Abstract

Copy  elimination  is  an  important  optimization  for  implementing  functional  languages.  Though  it  is 
related  to  the  problem  of  copy  propagation  that  has  been  considered  in  many  compilers  and  also  to 
storage  compaction,  the  term  is  used  in  a  more  general  context  where  structured  values  can  be  updated 
and  the  computation  tree  can  be  reordered.  Because  of  these  two  additional  possibilities,  copy  elimination 
is  a  hard  problem,  being  undecidable  in  general. 


We  propose  an  optimization  approach  based  on  abstract  interpretation  which  uses  fixpoint  iteration  for 
computing  address  expressions.  These  address  expressions  supply  the  final  target  for  a  computation, 
eliminating  the  need  to  copy  values  through  intermediate  results.  Our  work  is  in  the  context  of  a  single 
assignment  language  called  SAL.  Our  implementation  has  an  operational  model  for  computing  address 
expressions  by  using  reduction  rules.  Using  this,  we  show  that  copies  present  in  divide  and  conquer 
algorithms like bitonic sort and quicksort can be removed. We evaluate the effectiveness of these
optimizations, showing that in many cases, we can come close to the efficiency of an imperative language.
We present some data on optimising some small but tough benchmarks.
Copy elimination with
Abstract Interpretation


K.Gopinath  and  John  L.Hennessy 
Computer  Systems  Laboratory 
Stanford  University 


1.  Introduction 

Copy  elimination  is  an  important  optimization  for  implementing  functional  languages.  Though  it  is 
related to the problem of copy propagation that has been considered in many compilers [1], the term is
used  in  a  more  general  context  where  structured  values  can  be  updated  and  the  computation  tree  can  be 
reordered.  Because  of  these  two  additional  possibilities,  copy  elimination  is  a  hard  problem,  being 
undecidable  in  general.  Copy  propagation  is  adequate  in  imperative  languages  at  intermediate 
representation  level  since  the  programmer  takes  responsibility  for  avoiding  any  unnecessary  copies  in  the 
source  language,  such  copies  being  considered  a  reflection  of  poor  programming.  In  functional  programs, 
however,  the  lack  of  variables  in  the  language  requires  that  a  value  incrementally  different  from  some 
structure  be  expressed  as  a  new  structure.  This  may  involve  a  copy  depending  on  the  program  semantics 
and  the  implementation.  Copy  elimination  also  differs  significantly  from  storage  compaction  where  the 
issue  is  the  computation  of  a  program  in  the  smallest  amount  of  storage.  An  effective  storage  compaction 
scheme might avoid all the copies at the same time but is likely to be computationally expensive, since an
attempt is made to assign even unrelated names to the same storage space, partially or fully, as long as
they  have  non-overlapping  live  ranges.  The  latter  strategy  is  useful  in  the  case  of  limited  resources  like 
registers  but  minimizing  main  memory  is  of  less  value. 

In this paper, we will use the technique of abstract interpretation for copy elimination. Abstract
interpretation is a technique that was pioneered by Cousot and Cousot [5] for deriving properties of
programs. Using this approach, Alan Mycroft [12] considered the problem of detecting when a call-by-need
argument can be turned into a call-by-value argument, in the interests of efficiency. Hudak [9] in his
recent work has used the technique successfully to detect updates that can be done in-place by reference
counting. We will use abstract interpretation to eliminate copies also. This approach involves fixpoint
iteration for computing address expressions in the presence of recursive functions. These address
expressions can be used to compute the final target for a computation, eliminating intermediate
copies. The insights gained by this approach will be used to compute address expressions using reduction
rules for a large subset of a single assignment language called SAL. We will give a brief description of
SAL after describing how abstract interpretation can be used to eliminate copies. This approach has been
used to remove copies in divide and conquer problems like quicksort and bitonic sort and we report times
which are very close to those for imperative languages like Pascal on a set of small, but tough
benchmarks.


2.  Abstract  interpretation 

Let the standard denotation of a function definition be $\ll f \gg : D \to R$. Abstraction functions are
then defined to "simplify" the domains D and R. If two such functions $Abs1 : D \to A1$ and $Abs2 : R \to A2$
are chosen properly, the function $f : D \to R$ induces another interpretation $\ll abs\text{-}f \gg : A1 \to A2$
which can be used to answer some questions about the properties of f. This is the central idea of abstract
interpretation. A good example is the function type which abstracts just the type aspects of a function.
There  has  been  considerable  literature  on  using  abstract  interpretation  for  computing  strictness  of 
parameters. 

We use the abstraction Addr for both Abs1 and Abs2 in our application with both A1 and A2 being the
domain  of  address  expressions.  The  abstraction  Addr  maps  names  into  their  symbolic  addresses  and 
expressions  into  address  expressions.  Analogous  to  type  expressions,  address  expressions  describe  the 
address  of  an  expression  in  terms  of  the  parameters.  These  are  similar  to  the  effect-declarations  of 
Schwarz [13]. The mapping abs-f describes the sharing that is possible between the arguments and
the result of a function.

3.  Assumptions 

We assume a simple language similar to the one considered by Hudak [9]:

A program is a set of recursive function definitions (without higher-order functions)

$\{ f_i(x_1, \ldots, x_m) = body_i \mid i : 1..n \}$

with the main program being a call of f with zero arguments.

The predefined functions $p_i$ are as follows:

• standard arithmetic, boolean and array selector operations

• conditional: if p then c else a, abbreviated as if(p,c,a)

• create array function: mka(bounds, h) where bounds is an integer range and h is some
function which maps bounds to some image set.

• update function: upd(A,i,v) where A is an array, i is a legal index into A and v is of the same
type as A's elements; its value is the same as the array A except at the ith position where it is
v.

• sequence function: seq(A,l,m) where A is an array and l and m are legal indices into A; its
meaning is the same as the subarray of A starting at index l and ending at m. This will be
abbreviated to A[l..m].

• catenate function: cat(A,B) where A and B are sequences. For purposes of optimization and
the non-standard semantics that will be shortly developed, it has the following meaning: if
A=a[l..m] and B=a[n..p] then cat(A,B) is a[l..p] if m+1=n; otherwise, a newly created array
c[1..m-l+1+p-n+1] where c[1..m-l+1]=a[l..m] and c[m-l+2..m-l+1+p-n+1]=a[n..p]
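The meaning given to upd, seq and cat above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the Seq class, the use of Python lists for arrays, and the 1-based inclusive bounds are choices made for this example and are not part of the language definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Seq:
    """A sequence A[lo..hi]: a view on `base` with 1-based inclusive bounds (assumed model)."""
    base: List[int]
    lo: int
    hi: int

    def to_list(self) -> List[int]:
        return self.base[self.lo - 1:self.hi]


def upd(a: List[int], i: int, v: int) -> List[int]:
    """Standard (copying) update: a new array equal to `a` except at 1-based index i."""
    b = list(a)
    b[i - 1] = v
    return b


def seq(a: List[int], l: int, m: int) -> Seq:
    """seq(A, l, m), abbreviated A[l..m] in the text."""
    return Seq(a, l, m)


def cat(x: Seq, y: Seq) -> Seq:
    """cat as described above: adjacent slices of one array need no copy."""
    if x.base is y.base and x.hi + 1 == y.lo:
        return Seq(x.base, x.lo, y.hi)      # a[l..p], same storage
    c = x.to_list() + y.to_list()           # otherwise a newly created array
    return Seq(c, 1, len(c))


a = [10, 20, 30, 40, 50]
adjacent = cat(seq(a, 1, 2), seq(a, 3, 5))  # m+1 = n: shares storage with a
gapped = cat(seq(a, 1, 2), seq(a, 4, 5))    # m+1 != n: fresh array of length 4
```

In the gapped case the new array has m-l+1+p-n+1 = 2+2 = 4 elements, matching the bounds stated for c.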

We  adopt  the  convention  that  an  identifier  in  bold  italics  represents  the  address  expression  for  the 
corresponding  identifier  written  in  italics.  We  adopt  the  same  convention  for  representing  terms  in 
standard  and  non-standard  semantics. 

4.  A  Denotational  model 

We use a denotational model for the simple language to show that fixpoint computation of address
expressions is possible and that it terminates.

4.1.  Preliminaries 

We adopt a slightly modified version of the notation in [9]. Double angle brackets are used to surround
syntactic objects, as in $E \ll exp \gg$. A new environment is created by $[e_1/x_1, \ldots, e_n/x_n]$, which is abbreviated
to $[e_i/x_i]$ when the subscript bounds are clear from context. The notation $A^* \to B$ denotes the domain
$B + (A \to B) + (A \to A \to B) + \cdots$. We also assume that all domains are "lifted" as necessary, i.e., they are
provided with a unique least element. We will refer to each of them by $\bot$ instead of different ones for
each domain.

4.2.  Standard  semantics 

Let D be some suitable domain of basic values. For the standard semantics, we have the following
domains:

$c, p \in Con$ (constants including primitive functions)

$x \in Bv$ (bound variables)

$f \in Fv$ (function variables)

$body, e \in Exp$ (expressions)

where $e ::= c \mid x \mid p(e_1, \ldots, e_n) \mid f_i(e_1, \ldots, e_n)$

$pr \in Prog$ (programs)

where $pr ::= \{ f_i(x_1, \ldots, x_m) = body_i \mid i : 1..n \}$

Define two environments bve and fve for bound and function variables respectively. Let Ep be the
semantic function for giving meaning to programs and E and K correspondingly for expressions and
constants. The semantic algebra is as follows:

$fve \in Fve = Fv \to D^* \to D$
$bve \in Bve = Bv \to D$
$Ep : Prog \to Fve$
$E : Exp \to Fve \to Bve \to D$

$K : Con \to Exp^* \to Fve \to Bve \to D$ (assumed given)

The semantic equations are as follows:

$Ep \ll \{ f_i(x_1, \ldots, x_m) = body_i \mid i : 1..n \} \gg = fve$ whererec $fve = [\, E \ll body_i \gg fve\ bve \,/\, f_i \,]$

$E \ll c \gg fve\ bve = K \ll c \gg fve\ bve$
$E \ll x \gg fve\ bve = bve \ll x \gg$

$E \ll p(e_1, \ldots, e_n) \gg fve\ bve = K \ll p \gg (E \ll e_1 \gg fve\ bve), \ldots, (E \ll e_n \gg fve\ bve)$

$E \ll f_i(e_1, \ldots, e_n) \gg fve\ bve = fve \ll f_i \gg (E \ll e_1 \gg fve\ bve) \ldots (E \ll e_n \gg fve\ bve)$
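The standard semantic equations can be read as a small evaluator. The following Python sketch is a simplification made for illustration: expressions are encoded as tagged tuples, the function environment fve is a dictionary from names to (parameters, body) pairs rather than a domain of functions, the conditional is treated as a lazily evaluated primitive, and the set of arithmetic primitives is an arbitrary choice for the example.

```python
import operator

# Assumed primitive table for the example; the paper only says K is "assumed given".
PRIMS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "=": operator.eq}


def E(e, fve, bve):
    """Evaluate expression e under function environment fve and bound-variable environment bve."""
    tag = e[0]
    if tag == "const":                      # E<<c>>
        return e[1]
    if tag == "var":                        # E<<x>> = bve<<x>>
        return bve[e[1]]
    if tag == "if":                         # conditional: only one arm is evaluated
        _, cond, conseq, alt = e
        return E(conseq, fve, bve) if E(cond, fve, bve) else E(alt, fve, bve)
    if tag == "prim":                       # E<<p(e1..en)>>
        return PRIMS[e[1]](*[E(x, fve, bve) for x in e[2]])
    if tag == "call":                       # E<<f_i(e1..en)>>
        params, body = fve[e[1]]
        args = [E(x, fve, bve) for x in e[2]]
        return E(body, fve, dict(zip(params, args)))
    raise ValueError(f"unknown expression tag: {tag}")


# Example program: fact(n) = if n = 0 then 1 else n * fact(n - 1)
prog = {
    "fact": (["n"], ("if",
                     ("prim", "=", [("var", "n"), ("const", 0)]),
                     ("const", 1),
                     ("prim", "*", [("var", "n"),
                                    ("call", "fact", [("prim", "-", [("var", "n"), ("const", 1)])])]))),
}
result = E(("call", "fact", [("const", 5)]), prog, {})
```

Looking every call up in the same dictionary plays the role of the recursive binding written "whererec" in the equation for Ep.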

4.3.  Non-standard  semantics 

Given  a  program,  the  non-standard  semantics  derives  a  set  of  recursive  equations  for  the  address 
expressions  of  functions  which  can  then  be  computed  by  fixpoint  iteration. 

Let G be a set of names for anonymous arrays created by mka, cat, upd array operations and also
arrays created by if expressions so that there is a 1-1 correspondence between occurrences of these array
creating operations and the set G. Let each of these operations be labelled with an integer value so that
the symbolic name for the i th occurrence of any of these operations considered together is $g_i$. Also let
P(A) be the powerset of A.

$\mathbf{x} = Addr(x)$ for each $x \in Bv \cup G$;

$X = \{ \mathbf{x} \mid x \in Bv \cup G \}$

$\mathbf{G} = \{ \mathbf{g} \mid g \in G \}$

$RN_x$ be the subscript range of an array x; assume that $\mathbf{x}[RN_x]$ is always rewritten as $\mathbf{x}$

$A = \{ \mathbf{x}[l..m] \mid \text{array } x \in Bv \cup G;\ l, m \in RN_x \} - X$; so that A and X are disjoint.

$AddrExp = \{\bot, \top\} \cup P(X) \cup P(A)$

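AddrExp is a pointed cpo with the following partial ordering:

$\bot \le a \le \top,\ a \in AddrExp$
$a \le b$ if $a \subseteq b$

$\bot$ and $\top$ are the empty and universe set respectively. We define two monotonic operators $\mu$ and $\nu$ (used
for defining the semantics of if and cat respectively) over the domain AddrExp as follows:

$\mu(a, b, \mathbf{g})$:

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $aa \in P(X)$ and $bb \in P(A)$ & vice-versa

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $aa \in P(\mathbf{G})$ and $bb \in P(X - \mathbf{G})$ & vice-versa

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $aa = \{\mathbf{x}[l..m]\}$, $bb = \{\mathbf{x}[n..p]\}$, ($l \ne n$ or $m \ne p$)

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $aa = \{\mathbf{x}[l..m]\}$, $bb = \{\mathbf{y}[n..p]\}$, $x \ne y$ & $x, y \in G$

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $\mu(aa, bb, \mathbf{g}) \in \mathbf{G}$

$= a \cup b$ otherwise


$\nu(a, b, \mathbf{g})$:

$= \{\mathbf{x}[l..p]\}$ if $a = \{\mathbf{x}[l..m]\}$ and $b = \{\mathbf{x}[n..p]\}$ and $n = m+1$.

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $aa \in P(X)$ and $bb \in P(A)$ & vice-versa

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $aa \in P(\mathbf{G})$ and $bb \in P(X - \mathbf{G})$ & vice-versa

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $aa = \{\mathbf{x}[l..m]\}$, $bb = \{\mathbf{x}[n..p]\}$ & $n \ne m+1$

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $\nu(aa, bb, \mathbf{g}) \in \mathbf{G}$

$= \mathbf{g}$, $\exists\, aa \subseteq a$ & $bb \subseteq b$ such that $aa = \{\mathbf{x}[l..m]\}$, $bb = \{\mathbf{y}[n..p]\}$, $x \ne y$ & $x, y \in G$

$\bot$ is the bottom element for the cpo, signifying null information. If the two operands are incompatible,
the symbolic address for the anonymous array (passed as $\mathbf{g}$) is returned.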


We have the following new semantic algebra:

$Loc = AddrExp$
$Bve = Bv \to Loc$
$St = Loc \to D$

$fve \in Fve = Fv \to Loc^* \to St \to (Loc \times St)$

$EEp : Prog \to Fve$
$EE : Exp \to Fve \to Loc$
$KK : Con \to Exp^* \to Fve \to Loc$

The  semantic  functions  for  the  abstraction  have  some  unusual  features.  They  depend  on  the  standard 
semantics  also.  Hence,  a  product  construction  is  needed  to  describe  the  semantics.  For  simplicity  of 
presentation, we just present the non-standard semantics, the product construction being understood. We
will discuss how to eliminate this dependence on the standard semantics since discovering properties at
compile time for optimization purposes is not possible otherwise. In addition, we have the problem of
non-termination in the standard semantics. We use the notation $lab_e$ for the label corresponding to
expression e.

$EEp \ll \{ f_i(x_1, \ldots, x_m) = body_i \mid i : 1..n \} \gg = fve$ whererec $fve = [\, EE \ll body_i \gg fve \,/\, f_i \,]$

$EE \ll c \gg fve = \top$
$EE \ll x \gg fve = \mathbf{x}$
$KK \ll \text{arith/bool ops} \gg fve = \top$

$EE \ll f_i(e_1, \ldots, e_n) \gg fve = fve \ll f_i \gg (EE \ll e_1 \gg fve) \ldots (EE \ll e_n \gg fve)$

$KK \ll if \gg\ cond\ conseq\ alt\ fve =$

let

$t_1 := EE \ll conseq \gg fve$
$t_2 := EE \ll alt \gg fve$
in

$\mu(t_1, t_2, \mathbf{g}_{lab_{if}})$

end

$KK \ll cat \gg\ A[l..m]\ B[n..p]\ fve =$

let

$ll := E \ll l \gg fve\ bve$
$mm := E \ll m \gg fve\ bve$
$nn := E \ll n \gg fve\ bve$
$pp := E \ll p \gg fve\ bve$
$t_1 := (EE \ll A \gg fve)[ll..mm]$
$t_2 := (EE \ll B \gg fve)[nn..pp]$
in

$\nu(t_1, t_2, \mathbf{g}_{lab_{cat}})$

end

$KK \ll mka \gg\ bounds\ h\ fve = \mathbf{g}_{lab_{mka}}$

$KK \ll upd \gg\ A\ i\ v\ fve =$
let

$t := EE \ll A \gg fve$
in

if A is not live then $t$ else $\mathbf{g}_{lab_{upd}}$

end

(*liveness can be deduced by using Hudak's denotational semantics for reference
counting by using a product construction; see discussion below*)


4.4.  Discussion 

$\mu$ or $\nu$ cannot be simplified to g if the subsets aa or bb are both from P(X) or P(A). Consider the
following function:

function Q(A,B:arr):arr = if cond then A else B
$Q = \lambda A\, B.\ \{\mathbf{A}, \mathbf{B}\}$

Since Q can be called with arbitrary parameters A and B, it is not possible to determine the address
expression of the function body in a form simpler than $\{\mathbf{A}, \mathbf{B}\}$. If this is simplified to g, this may give us
pessimistic information about what can be shared in case Q is called with actual parameters such that
A = B. It might be thought that the problem can be eliminated if functions are in-lined but we see a
similar problem in the following recursive example (which cannot be in-lined).
f(a,b,c) =

if cond1 then cat(a,b)

elsif cond2 f(upd(a,i,v1), b, h1(c))

else f(a, upd(b,j,v2), h2(c))

$\mathbf{f} = \lambda a\, b\, c.\ \nu(\mathbf{a}, \mathbf{b}, \mathbf{g}_{lab_{cat}}) \cup \mathbf{f}(\mathbf{a}, \mathbf{b}, h1(c)) \cup \mathbf{f}(\mathbf{a}, \mathbf{b}, h2(c))$

The expression in the then part cannot be simplified in the abstract interpretation to g. If f is called in
the form: f(A[1..m], B[m+1..n], c) with A=B, then we get pessimistic answers. The simplification to g
results in the failure to propagate information across function boundaries. This is not just a theoretical
possibility; the bitonic sort program has this property. A similar problem exists in strictness analysis [8].

We need to make some small changes to Hudak's semantics for reference counting: If cat(A[l..m],
B[n..p]) can be updated in-place because A=B and n=m+1, the reference count for A does not decrease.
If  this  is  not  the  case,  the  reference  counts  for  both  A  and  B  are  reduced  by  one,  just  as  in  the  standard 
semantics. 

4.5.  Elimination  of  the  standard  semantics 

We next discuss how to eliminate the standard semantics from the equations. Consider the equation for
cat. One of the conditions for computing $\nu$ is (n=m+1). Instead of evaluating this by using the standard
semantics, it can be checked by symbolic analysis if the syntactic expressions for n and m are of a
particularly simple nature, namely induction variables. The condition n=m+1 can be checked by
matching subtrees of n and m. If the expressions for n and m do not satisfy the simple syntactic criteria,
the value false is returned. The abstraction, in this case, errs on the safe side. Also note that
computation of ll and pp is not necessary. Similarly, the check ($l \ne n$ or $m \ne p$) for computing $\mu$ can be
carried out symbolically. The new semantic equation for cat is as follows:

$KK \ll cat \gg\ A[l..m]\ B[n..p]\ fve =$
let

$t_1 := (EE \ll A \gg fve)[\ll l \gg .. \ll m \gg]$
$t_2 := (EE \ll B \gg fve)[\ll n \gg .. \ll p \gg]$

in

$\nu(t_1, t_2, \mathbf{g}_{lab_{cat}})$

end
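The symbolic check of n=m+1 by matching subtrees can be sketched as follows. This is one possible reading, not the paper's implementation: index expressions are encoded as strings (variables), integers, or tuples (op, lhs, rhs); the helper names normalise and is_successor are hypothetical; and only additions and subtractions of integer constants are interpreted, with every other subtree treated as an opaque base term.

```python
def normalise(e):
    """Split an index expression into (base, offset), meaning base + offset.

    base is None for a pure constant; otherwise it is an uninterpreted subtree.
    """
    if isinstance(e, int):
        return (None, e)
    if isinstance(e, tuple) and len(e) == 3:
        op, lhs, rhs = e
        if op in ("+", "-") and isinstance(rhs, int):
            base, off = normalise(lhs)
            return (base, off + rhs if op == "+" else off - rhs)
        if op == "+" and isinstance(lhs, int):
            base, off = normalise(rhs)
            return (base, off + lhs)
    return (e, 0)                           # variable or opaque subtree


def is_successor(n, m):
    """True only when n = m + 1 is evident syntactically; False is the safe answer."""
    n_base, n_off = normalise(n)
    m_base, m_off = normalise(m)
    return n_base == m_base and n_off == m_off + 1


half = ("/", "n", 2)                        # opaque subtree shared by both bounds
checks = [
    is_successor(("+", "m", 1), "m"),           # m+1 against m
    is_successor(("+", half, 1), half),         # n/2+1 against n/2
    is_successor(("+", ("-", "i", 1), 2), "i"), # (i-1)+2 against i
    is_successor("n", "m"),                     # unrelated names: cannot be shown
]
```

When the two expressions do not share a base the function answers False even if the values happen to satisfy n=m+1 at run time, which corresponds to the abstraction erring on the safe side.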

4.6.  Some  results 

Theorem 1: For any finite program $pr \in Prog$ (with bounded arrays), the fixpoints
corresponding to $f_i \in Fv$ are computable.

Proof: AddrExp is a finite cpo. Also, KK are constructed from monotonic operations. Hence fve
can be effectively computed by fixpoint iteration starting with the bottom element and iterating till the
least upper bound is reached.
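The iteration used in the proof can be illustrated with a generic sketch over a finite powerset ordered by inclusion. The function name kleene_chain and the example step function are assumptions made for the illustration; termination relies on step being monotonic and on the universe of names being finite, as in the theorem.

```python
def kleene_chain(step, bottom=frozenset()):
    """Iterate step from bottom until the value stops changing; return the whole chain."""
    chain = [bottom]
    while True:
        nxt = step(chain[-1])
        if nxt == chain[-1]:
            return chain                    # last element is the least fixpoint
        chain.append(nxt)


# A monotonic step on subsets of {"a", "b"}: always contributes "a",
# and swaps the names already present (a hypothetical transfer function).
RENAME = {"a": "b", "b": "a"}


def step(s):
    return frozenset({"a"}) | frozenset(RENAME[x] for x in s)


chain = kleene_chain(step)
```

Each strict step adds at least one name, so over a universe of k names the chain has at most k+1 elements.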

Theorem 2: If the fixpoint solution of $f_i \in Fv$ is some $\mathbf{x}$ then the result of the function can
be targeted to x and x can be updated in-place in the body of $f_i$ to give the result of the
function.

Proof: The proof is by structural induction on $body_i$. Since $f_i$ is a fixpoint, $f_i(\mathbf{x}) = F(f_i)(\mathbf{x}) = \mathbf{x}$.
Consider the following cases for $body_i$ in the semantic functions EE and KK:

x: If $body_i = x$, the result is immediate.

$h(e_1, \ldots, e_n)$: $f_i$ has $\mathbf{x}$ as fixpoint only if the abstraction of h maps some argument $e_j$ into $\mathbf{x}$. By induction
hypothesis, h's result can be obtained by the update of this argument. Hence $f_i$'s result can also be
obtained from the update of the same argument.

if(cond, conseq, alt): $f_i$ has $\mathbf{x}$ as a fixpoint only if both arms of the conditional map to the same address
expression. From the induction hypothesis, it follows that the result of the function is given by the update
of the parameter x if cond is true and also by x when cond is false. Hence the result is given by update of
x.

cat(A[l..m],B[n..p]): $f_i$ has $\mathbf{x}$ as fixpoint only if both A and B are mapped to $\mathbf{x}$ and n=m+1 with l and
p as the lower and upper bounds of x. Hence the result of the function is given by update of x if the cat
function is in-place for this condition.

upd(A,i,v): Again, $f_i$ has $\mathbf{x}$ as fixpoint only if A is mapped to $\mathbf{x}$. If A is not live then x can be updated
to give the result of $f_i$.

mka: This case cannot give rise to $\mathbf{x}$ as a fixpoint, hence vacuously true.

This  completes  the  theorem’s  proof  by  structural  induction. 

Theorem 3: The abstraction is safe.

Proof:  The  proof  is  by  structural  induction  which  is  omitted. 

4.7.  An  example 

To  illustrate  the  non-standard  semantics  given  above,  consider  the  program  for  reversing  the  elements 
of  an  array: 

function swap2(A: arr; Ai, Aj: arrelem; i: integer; j: integer): arr =
upd(upd(A, i, Aj), j, Ai)

function rev(Ap, Aq: arrelem; A: arr; i, n: integer): arr =

if n=1 or i=n/2 then A

else rev(A[n-i+2], A[i-1], swap2(A, Ap, Aq, n-i+1, i), i-1, n)

If swap2 is to be implemented in-place, the two additional parameters are needed because of the lack of
assignments. When the abstraction Addr is used, the new interpretation of the two equations for swap2

and rev by the non-standard semantics is as follows:

function swap2(A: arr; Ai, Aj: arrelem; i: integer; j: integer): arr = {A}

function rev(Ap, Aq: arrelem; A: arr; i, n: integer): arr =

{A} U {rev(A[n-i+2], A[i-1], A, i-1, n)}

Since swap2 occurs as a leaf in the call-graph, it is advantageous to find the interpretation of swap2
before rev. The new interpretation of swap2 is computed to be just A by using Hudak's semantics for
reference counting whereas the interpretation for rev has to be found by fixpoint iteration as follows:

$rev_0 := \bot$

$rev_1 := \{A\} \cup \bot := \{A\}$
$rev_2 := \{A\} \cup \{A\} := \{A\}$

Hence, abstract interpretation of rev under mapping Addr is just A. Hence the result of the function can
be  given  by  updates  on  the  parameter  A  from  Theorem  2. 
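The iteration for rev can be replayed mechanically. The sketch below is a simplified model, not the full non-standard semantics: an abstract value is a set of parameter names, conditional arms are combined with plain set union rather than the operator used for if, only the array argument is tracked, and swap2 is given the abstraction {A} on the assumption (stated above) that liveness analysis allows it. The term encoding and the names eval_term and solve are hypothetical.

```python
def eval_term(term, approx):
    """Abstract value of a term, as a set of the enclosing function's parameter names."""
    kind = term[0]
    if kind == "param":
        return frozenset({term[1]})
    if kind == "call":
        _, callee, actuals = term
        out = frozenset()
        # The callee's result aliases some of its parameters; map them to the actuals.
        for p in approx[callee]:
            if p in actuals:
                out |= eval_term(actuals[p], approx)
        return out
    raise ValueError(kind)


def solve(bodies):
    """Simultaneous fixpoint iteration from bottom (the empty set) for every function."""
    approx = {f: frozenset() for f in bodies}
    trace = [approx]
    while True:
        nxt = {f: frozenset().union(*(eval_term(t, approx) for t in alts))
               for f, alts in bodies.items()}
        if nxt == approx:
            return trace
        approx = nxt
        trace.append(approx)


bodies = {
    # swap2 = {A}
    "swap2": [("param", "A")],
    # rev = {A} U rev(..., swap2(A, ...), ...)
    "rev": [("param", "A"),
            ("call", "rev", {"A": ("call", "swap2", {"A": ("param", "A")})})],
}
trace = solve(bodies)
rev_chain = [step["rev"] for step in trace]
```

The chain for rev starts at the empty set and stabilises at {A} after one step, mirroring the sequence of approximations listed above.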

4.8.  Applicability  to  other  functional  languages 

We  discuss  briefly  the  applicability  of  the  approach  we  have  taken  for  copy  elimination  in  the  context  of 
features  that  are  not  present  in  the  simple  language  considered. 

Call-by-need/call-by-value/lazy evaluation: A perusal of the non-standard semantics shows that these
evaluation mechanisms make their effects felt through the computation of liveness of names. Hudak and
Bloss [2] have considered the problem of computing the order of evaluation of subexpressions in the
context of call-by-need parameter evaluation mechanism and this can be used for determining liveness.
Lazy evaluation, which differs from call-by-need in evaluating an expression more than once if needed,
presents considerable difficulties. If infinite structures are present, the structure is live and lazy evaluation
is needed. We, therefore, need analysis to detect if a structure necessarily has to be evaluated by lazy
evaluation and analysis to convert lazy evaluation to call-by-need or other forms of evaluation.

Wadler  [1*1]  considers  some  of  these  issues  for  a  limited  class  of  situations  arising  in  practice. 

Higher-order functions: Higher order functions could be handled if Hudak's semantics can be extended
to higher-order functions (which has not been done yet). Even with this assumption, termination can no
longer be guaranteed. To see why, consider the simple language extended with higher order functions.

The new domains are as follows:

$c, p \in Con$ (constants including primitive functions)

$f_i, lambda \in Fn$ (function variables with bodies and anonymous lambda expressions)
$x, h \in Bv$ (bound variables including function parameters)

$body, e \in Exp$ (expressions)

where $e ::= c \mid x \mid p(e_1, \ldots, e_n) \mid f(e_1, \ldots, e_n) \mid \lambda x.e$

Bv includes formal parameters which are functions whereas Fn also includes anonymous lambda
abstractions. We need additional semantic rules to take care of lambda abstractions:

$EE \ll f_i \gg fve = \mathbf{f}_i$

$EE \ll \lambda x.e \gg fve = \lambda \mathbf{x}.\ EE \ll e \gg fve$

$EE \ll f(e_1, \ldots, e_n) \gg fve = (EE \ll f \gg fve)\ (EE \ll e_1 \gg fve) \ldots (EE \ll e_n \gg fve)$

The  above  semantic  rules  cause  the  evaluation  of  the  function  part  of  an  application  since  there  could  be 
anonymous  lambda  expressions. 

Let  us  see  why  there  can  be  non-termination.  Consider  the  following  program: 
f(h,a,b,c) =

if cond then h(h, h(h,a,b,c), b, c)

else f(h, upd(a,i,v1), upd(b,j,v2), h1(c))

Using the non-standard semantics, we have the following fixpoint iteration:

$f_0 = \bot$

$f_1 = \{h(h, h(h,a,b,c), b, c)\} \cup \bot := \{h(h, h(h,a,b,c), b, c)\}$

$f_2 = \{h(h, h(h,a,b,c), b, c)\} \cup \{h(h, h(h,a,b,c), b, c)\} := \{h(h, h(h,a,b,c), b, c)\}$

Consider now the self-application of f by substituting f itself for h. The fixpoint is not defined because of
the  non-terminating  computation  involved. 


5.  Copy  elimination  in  SAL 

SAL is a single assignment language defined at Stanford [3] providing iteration, parametric types and
streams  with  scoping  mechanisms  similar  to  Algol  languages. 

We  briefly  describe  some  of  the  constructs  of  the  SAL  language: 

•  simple  let-expression:  allows  for  bindings  of  values  to  names. 

• array former: specifies an order-independent iteration (for all) which produces an array
result: for be exp which is similar to mka(bounds, h).

• redefining let-expressions: implements iteration with loop dependencies; the semantics are
very close to what is found in VAL and SISAL. The iterative statements have a set of initial
value definitions, a corresponding set of redefinitions used to define new values on every
iteration, a loop control, and a result value defined in terms of the final values of the names in
the redefinitions. To be able to specify the redefinitions without any implied sequencing, the
keyword old is used to refer to the value of a name in the previous iteration. Every redefining
let-expression can be rewritten using only tail-recursive functions.

• constructor: creates structured values, similar to aggregates in Ada. Constructors are often
used with a left hand side consisting of multiple components as in ML and Algol68

• new array: creates a value which matches the argument array but with one component value
changed: A:i->v which is the same as upd(A,i,v)

• record field selector, array element selector: selector operations in records and arrays

• insert expression: the reduction operator similar to the 1 1 1 operator in 1 1*
The atomic datatypes are integers, booleans and reals. The structures present are arrays, records,
discriminated  unions  and  streams.  For  simplicity  of  presentation,  we  deal  mainly  with  arrays  but  many 
of  the  issues  discussed  typically  arise  in  records  and  discriminated  unions  also. 

5.1.  Computing  Address  expressions  of  SAL  constructs 

The address expression for a simple let-expression is easy to state but redefining let-expressions are
difficult  to  handle  in  the  presence  of  optimisations  (sharing  between  new  and  old  values  which  requires  a 
form  of  dependency  analysis  considered  in  vectorising  compilers)  needed  to  implement  it  efficiently.  This 
makes  computation  of  the  address  expression  using  denotational  semantics  difficult.  We  will,  therefore, 
develop  an  alternate  approach  to  tackle  this  problem  using  reduction  rules. 

Constructors enable more than one value to be returned. This is especially useful for implementing
non-locals by returning updated non-locals along with the result of the function value. We need an additional
semantic equation:

The  domain  now  has  a  more  complex  cross  product  structure. 

5.2.  Computing  address  expressions  using  reduction  rules

A  reduction  rule  describes  how  the  address  of  an  arbitrary  expression  can  be  reduced  into  an  address  of 
a simpler expression. Each reduction rule has two components, a cond and a reduction, the
reduction  being  performed  only  if  cond  is  true.  Define  terminal  expression  as  the  irreducible  term  of  an 
expression. 

A set of reduction rules is presented in Figures 5-1, 5-2. These rules are not complete and do not take
into account many subtle details that are important in optimization. Due to lack of space, we also do not
consider dependent iterations, though we consider examples which have these constructs.
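A conditional reduction of this kind can be sketched as a small rewriter. The Python below covers only three rules of the sort listed in Figure 5-1 (the conditional, the function call, and the array update A:i->v); the tuple encoding of expressions, the `assume` table recording which parameter a function's result is taken to share storage with, and the ("Addr", e) marker for an irreducible term are all assumptions made for this illustration.

```python
def addr(e, live, assume):
    """Reduce Addr(e) to a terminal expression: a name, or an irreducible ("Addr", e) term."""
    if isinstance(e, str):
        return e                                    # a name is already terminal
    tag = e[0]
    if tag == "upd":                                # ("upd", A, i, v) stands for A:i->v
        base = addr(e[1], live, assume)
        if isinstance(base, str) and base not in live:
            return base                             # cond holds: A is not live
        return ("Addr", e)                          # cond fails: no reduction
    if tag == "call":                               # ("call", f, [args])
        _, f, args = e
        if f in assume:                             # assumed: result shares args[k]
            return addr(args[assume[f]], live, assume)
        return ("Addr", e)
    if tag == "if":                                 # ("if", cond, X, Y)
        _, cond, x, y = e
        ax, ay = addr(x, live, assume), addr(y, live, assume)
        return ax if ax == ay else ("if", cond, ax, ay)
    return ("Addr", e)


# Body of init from Section 5.3: if n=0 then a else init(a:n->x, x, n-1)
body = ("if", ("=", "n", 0), "a",
        ("call", "init", [("upd", "a", "n", "x"), "x", ("-", "n", 1)]))

reduced = addr(body, live=set(), assume={"init": 0})
blocked = addr(body, live={"a"}, assume={"init": 0})
```

With a not live the body reduces to the name a; when a is marked live the update rule's condition fails and the conditional cannot be collapsed.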

5.3.  Examples 

Consider  the  following  program: 

-- initialises all the elements of an array
type arr = array [1..m] of integer;

function init(a: arr; x, n: integer): arr = if n=0 then a else init(a:n->x, x, n-1)

Let us compute the address expression of the function body.

Addr(if n=0 then a else init(a:n->x, x, n-1))

=> if n=0 then Addr(a) else Addr(init(a:n->x, x, n-1))

Here, we have used the following slightly different rule for conditional, for purposes of explanation:

Addr(if condition then X else Y) => if condition then Addr(X) else Addr(Y)

No more reductions are possible unless we know that init can return its first argument as the result.
1. Addr(if condition then X else Y) => Addr(X), if Addr(X) = Addr(Y)

2. Let $f = \lambda(a, b, \ldots).\,exp$ and $\mathbf{f} = \lambda(\mathbf{a}, \mathbf{b}, \ldots).\,a\text{-}exp$

Then $Addr(f(x, y, \ldots)) \Rightarrow \mathbf{f}(a\text{-}exp)[Addr(x)/\mathbf{a}, Addr(y)/\mathbf{b}, \ldots]$

3. Addr(for x:l..h be A[x+c]) => Addr(A)[l+c..h+c], c a constant

4. Addr(for i:1..n be if i=1 then e else A[i-1]) => Addr(A)[1..n], if A is not live and if
loop is executed in the reverse order (from n to 1)^1

5. Addr([exp_1, exp_2, ..., exp_n]) => [Addr(exp_1), Addr(exp_2), ..., Addr(exp_n)]

6. Addr(cat(A[l..h], B[m..n])) => Addr(A)[l..n], if m=h+1 and Addr(A)=Addr(B)

7. Addr(A:i->v) => Addr(A), if A is not live.

8. Addr(A.b:i->v) => Addr(A.b), if A is not live

9. Addr(A[i]:j->v) => Addr(A[i]), if A is not live^2

Figure 5-1: Reduction Rules in the absence of assignments

1. Addr(let definitions in value-expression end) => Addr(value-expression)

2. if an assignment A:=exp is in a simple let-expression, then Addr(A) => Addr(exp)
unless the terminal expression of Addr(exp) is a name and it is live in the reordering
chosen for the definitions and rest of the program graph.

3. if a parallel assignment [l_1,...,l_i,...]:=exp such that Addr(exp) => [r_1,...,r_i,...] occurs
in a simple let-expression, then for each i, Addr(l_i) => Addr(r_i) unless the terminal
expression of Addr(r_i) is a name and it is live in the reordering chosen for the
constructor and the rest of the program graph.

Figure 5-2: Rules for Assignments in simple let-expression


Interprocedural analysis is required to compute this information here and also in more complicated
examples where non-locals are implemented by passing them as parameters. Proceeding with this
information, we derive the following:

=> if n=0 then Addr(a) else Addr(a: n->x)
   assuming Addr(init(a, x, n)) => Addr(a)
=> if n=0 then Addr(a) else Addr(a)   -- since a is not live
=> Addr(a)                            -- by rewriting conditional

Hence we can prove the consistency of the assumption that the function result can be shared with the
first argument. This optimization, which makes the result and the first argument share the same
storage, turns the value parameter into a var parameter and at the same time converts the function
into a procedure. Before this optimization can be implemented, we need to check that the array a is
not live in the callee after its transmission as a parameter.
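The effect of this conversion can be sketched outside SAL. The Python fragment below is a minimal illustration, assuming that each update a: n->x under value semantics costs one array copy; the function names and the `stats` counter are hypothetical and only serve to make the copies visible.

```python
def init_functional(a, x, n, stats):
    """Value semantics: each update a: n -> x produces a fresh array."""
    if n == 0:
        return a
    updated = list(a)            # the copy that the optimization removes
    stats["copies"] += 1
    updated[n - 1] = x           # SAL arrays are 1-based in the example
    return init_functional(updated, x, n - 1, stats)


def init_in_place(a, x, n):
    """After the optimization: a is a var parameter and init is a procedure."""
    if n == 0:
        return
    a[n - 1] = x                 # safe only because the old a is not live
    init_in_place(a, x, n - 1)
```

With an array of five elements the functional version performs five copies and leaves its argument untouched, while the procedure version performs none and overwrites its argument, which is why the liveness check on the actual parameter is needed.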

To illustrate the rules for assignment in simple let-expression, consider the following fragment from the
puzzle program. Constructs that have not been considered so far have been rewritten using functions, and
some simplifications have also been made.

function place(i: pieceType; j: position;
               puzzle: puztype; pieceCount: petype): placetype =
let
    temp: integer := class[i];
    plim: position := pieceMax[i];
    puz: puztype := letFunction(puzzle, plim, i, j);
    pc: petype := pieceCount: temp -> pieceCount[temp] - 1;
    result: position := leastFunction(j, size, puz)
in [result, puz, pc]
end

Assume it has already been discovered that Addr(letFunction(x, p, i, j)) => Addr(x) and
Addr(leastFunction(j, k, x)) => Addr(j). We would like to target the assignments and also find the
address expression of the function place. The first two definitions do not cause any interesting storage
sharing to happen. The third definition can be targeted so that Addr(puz) => Addr(puzzle) since puzzle
is not live in the function after this use. No reordering is needed since this is the only use of puzzle. The
new-array can be evaluated in-place since pieceCount is not live. Hence, Addr(pc) => Addr(pieceCount).
Finally, Addr(result) => Addr(j) from the address expression for leastFunction, since j is not live in
the function if the assignment for puz is evaluated before the assignment for result. The address
expression for the function place is Addr([result, puz, pc]), which can be reduced to [Addr(j),
Addr(puzzle), Addr(pieceCount)]. Notice that all of them involve formal parameters of the function
place; this signifies that the function place returns as its result some updated version of these formal
parameters. If copies are to be eliminated, these parameters can be changed into var parameters once it
has been shown that the actual parameters of the function place are not live in any invocation. It might
be advantageous to restrict conversion into var parameters to structured values only, to lessen some of the
implementation difficulties encountered when this is attempted for scalars.
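The targeting of the definitions in place can be sketched as follows. This is a simplified illustration, not the compiler's algorithm: it assumes a fixed evaluation order (no reordering), treats a name as live if it is used by any later definition or by the result, and encodes the already discovered facts about letFunction and leastFunction in a hypothetical `summaries` table giving the index of the argument whose storage the result shares.

```python
def uses(rhs):
    """Names read by a right-hand side in the hypothetical encoding below."""
    if rhs[0] == "call":               # ("call", function, [arguments])
        return set(rhs[2])
    if rhs[0] == "update":             # ("update", array, [other names read])
        return {rhs[1]} | set(rhs[2])
    return set(rhs[1])                 # ("other", [names read])


def target(defs, result, summaries):
    """Apply rule 2 of Figure 5-2 to each definition, in the given order."""
    addr = {}
    for k, (name, rhs) in enumerate(defs):
        live_later = set(result)
        for _, later_rhs in defs[k + 1:]:
            live_later |= uses(later_rhs)
        if rhs[0] == "call" and rhs[1] in summaries:
            source = rhs[2][summaries[rhs[1]]]
        elif rhs[0] == "update":
            source = rhs[1]
        else:
            continue                   # no interesting storage sharing
        if source not in live_later:   # the source is dead: reuse its storage
            addr[name] = addr.get(source, source)
    return [addr.get(r, r) for r in result]


place_defs = [
    ("temp", ("other", ["class", "i"])),
    ("plim", ("other", ["pieceMax", "i"])),
    ("puz", ("call", "letFunction", ["puzzle", "plim", "i", "j"])),
    ("pc", ("update", "pieceCount", ["temp"])),
    ("result", ("call", "leastFunction", ["j", "size", "puz"])),
]
summaries = {"letFunction": 0, "leastFunction": 0}
place_addr = target(place_defs, ["result", "puz", "pc"], summaries)
```

Under these assumptions `place_addr` names the three formal parameters j, puzzle and pieceCount, matching the reduction described above.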


6. Fixpoint computation using reduction rules

To extend the method of computing address expressions using fixpoint iteration developed for the simple
language in Section 4 to SAL, we need some modifications in the semantics. The most important is the
meaning of names other than parameters, both local and non-local.

6.1. Extensions

The new domain AddrExp is given by {⊥, ⊤} ∪ P(X ∪ Y) ∪ P(A ∪ B) where

Y = {y | y is a local name}
B = {y[l..m] | y ∈ Y, y is an array and l, m ∈ ℕ_y} - Y

The operators μ and ν have to be extended with the following cases:

μ(a, b, g)
  = g  if ∃ (aa ⊆ a & bb ⊆ b) & aa ∈ P(Y) and bb ∈ P(B) & vice-versa
  = g  if ∃ (aa ⊆ a & bb ⊆ b) & aa ∈ P(Y) and bb ∈ P(A) & vice-versa
  = g  if ∃ (aa ⊆ a & bb ⊆ b) & aa ∈ P(X) & bb ∈ P(B) & vice-versa
  = μ(Addr(a), Addr(b), g)  if a or b ∈ P(Y) ∪ P(B) & Addr(a) or Addr(b) can be reduced
  = [μ(p_i, q_i, g_i) | i = 1..n]  if a = [p_i | i = 1..n], b = [q_i | i = 1..n] & g = [g_i | i = 1..n]

ν(a, b, g)
  = ν(Addr(a), Addr(b), g)  if a or b ∈ P(Y) ∪ P(B) & Addr(a) or Addr(b) can be reduced

Since μ is associative, we will also use it for an arbitrary number of arguments. The domain AddrExp still
remains a cpo and ensures that the fixpoint computation always terminates.

Define Results of an expression as follows:

Results(let definitions in value-expression end) = Results(value-expression)

Results(let initial definitions
        while/for cond do
            redefinitions
        giving value-expression
        end) = Results(value-expression)

Results(if cond then c else a) = ∪(Results(c), Results(a))

For other expressions, Results(expression) = {expression}

The  Results  of  a  function  body  returns  all  the  syntactic  expressions  that  are  embedded  in  let-expressions 
or  conditionals  which  may  be  returned  as  the  value  of  the  function. 
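The definition of Results translates almost directly into a recursive function. The sketch below assumes a hypothetical tuple encoding of expressions (the tags "let", "loop" and "if" are illustrative names, not SAL syntax) and represents every other expression as an opaque string.

```python
def results(expr):
    """Collect the expressions that may be returned as the value of `expr`."""
    tag = expr[0] if isinstance(expr, tuple) else None
    if tag == "let":     # ("let", definitions, value_expression)
        return results(expr[2])
    if tag == "loop":    # ("loop", initial_defs, cond, redefs, value_expression)
        return results(expr[4])
    if tag == "if":      # ("if", cond, consequent, alternative)
        return results(expr[2]) | results(expr[3])
    return {expr}        # any other expression is its own result


# Shape of the body of permute (Figure 6-2), with the definitions elided.
permute_body = ("if", "n <> 1",
                ("let", (), "[A2, pctr2 + 1]"),
                "[A, ctr + 1]")
permute_results = results(permute_body)
```

Here `permute_results` contains the two tuple constructors of the function body, the same set that is used in the permutations example of Section 6.3.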

6.2. Algorithm

The first part consists of the propagation of address expressions by fixpoint iteration. The address
expression of each function is first set to ⊥. For each function (in a topological order, if possible),
compute μ of all the members of the set Results(f.body) mapped by Addr. Set Addr(f) equal to
this value and propagate this information at all the sites of the function call by provisionally sharing all
the names that are in the same subsets in the address expressions. It has to be provisional since additional
information might cause some set of sharings to be inconsistent, which then have to be undone. Repeat
until there are no changes in the Addr expression computed for each function.

Next, all the actuals corresponding to the parameters that will be updated have to be checked for
non-liveness. Finally, the provisional sharing between results and parameters made in the previous stages is
Let fList be the list of functions.

Set the address expression of each function f.Addr to ⊥.
Compute a topological ordering for fList if possible ³.

change := true
while change do
    change := false
    for each function f ∈ fList in the topological order (if possible) do
        mapEx := μ({Addr(result_i) & result_i ∈ Results(f.body)})
        diff := mapEx - f.Addr
        if diff ≠ nil then
            change := true
            f.Addr := mapEx
            Propagate this new value at all the call sites of f
    endfor
endwhile

Figure 6-1: Algorithm for computing address expressions of functions


made  final  using  the  information  obtained. 
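The iteration of Figure 6-1 can be sketched in a few lines. The version below is a deliberately simplified model, not the SAL implementation: an address expression is just a set of formal parameter names, ⊥ is the empty set, μ is set union, updates are assumed to pass the liveness check, and the provisional sharing at call sites is reduced to substituting actuals for formals. All names in the code are hypothetical.

```python
BOTTOM = frozenset()


def addr(expr, funcs, table):
    """Address of a result expression as a set of names (simplified model)."""
    if isinstance(expr, str):
        return frozenset([expr])
    if expr[0] == "update":            # ("update", array): assumed not live
        return addr(expr[1], funcs, table)
    if expr[0] == "call":              # ("call", function, [actuals])
        formals = funcs[expr[1]][0]
        shared = set()
        for formal in table[expr[1]]:  # substitute actuals for formals
            shared |= addr(expr[2][formals.index(formal)], funcs, table)
        return frozenset(shared)
    raise ValueError(f"unknown expression: {expr!r}")


def solve(funcs):
    """funcs maps a name to (formals, Results of the body); iterate to a fixpoint."""
    table = {name: BOTTOM for name in funcs}
    change = True
    while change:
        change = False
        for name, (_, body_results) in funcs.items():
            map_ex = frozenset().union(
                *(addr(r, funcs, table) for r in body_results))
            if map_ex != table[name]:
                change = True
                table[name] = map_ex
    return table


funcs = {
    "wrapper": (["b"], [("call", "init", ["b", "x", "n"])]),
    "init": (["a", "x", "n"],
             ["a", ("call", "init", [("update", "a"), "x", "n"])]),
}
solution = solve(funcs)
```

For the init function of Section 5 this model converges to the address expression {a}, and the hypothetical `wrapper` function, which simply calls init, ends up sharing its result with its own parameter b. The functions are listed in a deliberately unfavourable order to show that an extra pass is then needed.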


6.3.  An  example:  counting  permutations 

To illustrate an example of fixpoint computation for address expressions, we present the function
permute which counts the number of permutations of the elements of an array (Figure 6-2). The
Results of the function body of permute is given by { [A2, pctr2 + 1], [A, ctr + 1] }. Mapping
Results by Addr and computing mapEx gives the address expression as [{A} ∪ {A1}, {ctr} ∪
{pctr}]. Here we have used the information that A2 => A1 and pctr2 => pctr. This information has
to be propagated provisionally at all the sites of the function call: hence A1 shares storage with A and
pctr with ctr. Similarly A3 shares storage with A1 (through old A1). Furthermore, A2 shares storage
with A1, and pctr with pctr2. If we now compute the Addr expression of the function body of permute, all
the Results get reduced to [{A}, {ctr}]. We now have found the fixpoint for the address expression of
the function permute.
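What this fixpoint licenses can be illustrated by an in-place rendering of permute. The Python sketch below is an assumption-laden translation of Figure 6-2, not output of the SAL compiler: the array is shared between all calls (as a var parameter would be), while the counter is kept as an ordinary value, in line with the earlier remark about restricting the conversion to structured values.

```python
def permute(n, a, ctr):
    """In-place rendering of permute from Figure 6-2; returns the new counter."""
    if n != 1:
        ctr = permute(n - 1, a, ctr)
        for k in range(n - 1, 0, -1):                    # k = n-1 down to 1
            a[n - 1], a[k - 1] = a[k - 1], a[n - 1]      # swap(old A1, n, old k)
            ctr = permute(n - 1, a, ctr)
            a[n - 1], a[k - 1] = a[k - 1], a[n - 1]      # A1 := swap(A3, n, old k)
    return ctr + 1
```

Every call restores the array before returning, so a single array serves all the recursive calls and no copies are made.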

6.4.  Divide  and  conquer  problems 

A very important subcase of interprocedural analysis occurs in divide and conquer algorithms. Such
algorithms have significant opportunity for parallelism, but straightforward implementations may
produce a significant number of copies, thus losing any performance advantage.

Consider a schema for divide and conquer problems using arrays as the data structure and assuming
that the results can be combined by simple catenation (using the function cat).


³ There are [...]; any ordering would do, but it [...]
type permType = record X: arr; Y: integer end

function permute(n: integer; A: arr; ctr: integer): permType =
if n <> 1 then
    let
        A2: arr; pctr2: integer;
        [A2, pctr2] :=
            let
                A1: arr; pctr: integer;
                [A1, pctr] := permute(n-1, A, ctr);
                k: integer := n-1
            while k >= 1 do
                k := old k - 1;
                A3: arr;
                [A3, pctr] := permute(n-1, swap(old A1, n, old k), old pctr);
                A1 := swap(A3, n, old k)
            giving [A1, pctr]
            end
    in [A2, pctr2 + 1]
    end
else [A, ctr + 1]
end;

Figure 6-2: Program to compute the number of permutations of an array


f(A: array [1..l] of T) :=
    if l = 1 then h(A)
    else cat(f(for i: 1..l' be A[i]), f(for i: 1..l-l' be A[l'+i]))

Since the base recursion involves the single element of the array, the function body can be evaluated
without using more than O(1) space, if it can be proved that there is no overlap in the parameters passed
to the recursive calls of f. That is, there is a possibility of sharing amongst the parameters among
successive calls to f, if we show that there is no overlap in the elements accessed by the array former in
the argument. Realization that this operation can be done in place leads to a very efficient program since
the non-optimized program has to allocate and deallocate arrays at each level of the recursion in addition
to the copy of the result from one level of the recursion to the next. Furthermore, if we assume that cat
does not allocate storage or perform a copy if the objects are already adjacent, then we can eliminate the
temporary storage and copying on return.
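The difference between the two evaluation strategies for this schema can be sketched as follows. The fragment is illustrative only: h is modelled as a function on a single element, the split point l' is taken to be the midpoint, and the `stats` counter is a hypothetical device for counting the arrays that the unoptimized version allocates.

```python
def f_copying(a, h, stats):
    """Schema evaluated naively: array formers and cat each build a new array."""
    if len(a) == 1:
        return [h(a[0])]
    mid = len(a) // 2
    left, right = a[:mid], a[mid:]          # two array formers, two copies
    stats["copies"] += 2
    out = f_copying(left, h, stats) + f_copying(right, h, stats)
    stats["copies"] += 1                    # cat copies the two results
    return out


def f_in_place(a, lo, hi, h):
    """Schema evaluated on the index range lo..hi-1 of one shared array."""
    if hi - lo == 1:
        a[lo] = h(a[lo])
        return
    mid = lo + (hi - lo) // 2
    f_in_place(a, lo, mid, h)               # the two ranges do not overlap
    f_in_place(a, mid, hi, h)
    # cat of two adjacent ranges of the same array needs no work
```

On a five-element array the copying version allocates twelve arrays (three for each of the four internal nodes of the recursion tree), whereas the in-place version allocates none.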

Given a set of recursive and non-recursive functions which implement a divide and conquer algorithm,
we need to find out whether in-place modification of the data structure is possible and safe. To do this,
we need to find out how the input data structure is subdivided and how the result is composed. Computing
the address expression for each function definition gives exactly this information. These address
expressions describe how the address expression of a function result is defined in terms of the address
expressions of its arguments. The fixpoints for the addresses of the functions are computed by iteration
and by use of the reduction rules to simplify the expressions of the function bodies. However, type
parameters make it possible to create type uncheckable expressions by using non-terminating function
applications and looping constructs when defining index ranges for arrays. This may prevent address
expressions from being computed. If index ranges and indices are assumed to be of a restricted syntactic
form (for example, linear induction variables) for which symbolic analysis is helpful in determining simple
algebraic properties and identities, the address expressions may be computable. This process is most
easily understood from the viewpoint of an example (Figure 6-3).


-- n is a power of 2

type arr(n: integer) = array [1..n] of integer;

function reverse(X: arr(n)): arr(n) =
if n = 1 then X
else
    let A: arr(n) := X
    for i: 1..n/2 do
        A := (old A: i -> old A[n-i+1]): (n-i+1) -> old A[i]
    giving A
    end;

function sortbitonic(X: arr(n)): arr(n) =
if n = 1 then X
else
    let A: arr(n) := X
    for i: 1..n/2 do
        A := if old A[i] < old A[i+n/2] then old A
             else (old A: i -> old A[i+n/2]): (i+n/2) -> old A[i]
    giving
        let
            lower: arr(n/2) := for i: 1..n/2 be A[i];
            upper: arr(n/2) := for i: 1..n/2 be A[i+n/2];
        in cat(sortbitonic(lower), sortbitonic(upper))
        end
    end

function merge(X: arr(n); Y: arr(m)): arr(n+m) =
    sortbitonic(cat(X, reverse(Y)));

function sort(X: arr(n)): arr(n) =
if n = 1 then X
else
    let
        lower: arr(n/2) := for i: 1..n/2 be X[i];
        upper: arr(n/2) := for i: 1..n/2 be X[i+n/2];
    in merge(sort(lower), sort(upper))
    end

Figure 6-3: Bitonic sort
Consider the function sort. It is clear that X can be sorted in place without any additional array storage
or any copies. Determining that this is so in a compiler is a non-trivial matter. This depends on detecting
the following in the process of proving consistency.

1. lower and upper are nonoverlapping (i.e., 1..n/2 and n/2+1..n are non-overlapping)

2. the parameter X and the result of the function sort can share the same storage.

The last condition is the most complex. We show the initial equations that are written for the address
expressions, and the simplification of the equations. Note that we require other optimisation steps to
occur to achieve the desired results. These steps are noted.

reverse
=> λX. X ∪ Addr(redefining let-expression)
=> λX. X ∪ A
=> λX. X ∪ X
=> λX. X

Hence reverse(X) can be done without using extra arrays. Now, consider the function sortbitonic,
abbreviated as sb.

sb
=> λX. X ∪ Addr(redefining let-expression)
=> λX. X ∪ Addr(simple let-expression)
=> λX. X ∪ Addr(cat(sb(lower), sb(upper)))
=> λX. X ∪ ν(Addr(sb(lower)), Addr(sb(upper)), g_cat)
=> λX. X ∪ ν(sb(A[1..n/2]), sb(A[n/2+1..n]), g_cat)
=> λX. X ∪ ν(sb(X[1..n/2]), sb(X[n/2+1..n]), g_cat)   -- coalescing of A & X

The fixpoint iteration is as follows:

sb_0 := λX. ⊥
sb_1 := λX. X ∪ ⊥ := λX. X
sb_2 := λX. X ∪ ν(X[1..n/2], X[n/2+1..n], g_cat) := λX. X

Hence, Addr(X) is a fixpoint solution for Addr(sb(X)), and similarly, Addr(X) is a fixpoint solution for
Addr(sort(X)).

From the above analysis, we can conclude that sort(X) can be accomplished in place. Note that this
process is both subtle and easily negated: if the function reverse is defined in a different way, say:

for i: 1..n be X[n-i+1]

then an additional array is needed and one cannot conclude that sort can be done in-place.
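The conclusion of this analysis can be illustrated by writing the program of Figure 6-3 directly over index ranges of a single array. The Python sketch below is one hand-written rendering under that assumption (the `lo` and length parameters are hypothetical additions that stand for the shared address expressions); it is not the code produced by the SAL compiler.

```python
def reverse(a, lo, n):
    """Reverse a[lo:lo+n] by pairwise exchange, as in the for loop of reverse."""
    for i in range(n // 2):
        a[lo + i], a[lo + n - 1 - i] = a[lo + n - 1 - i], a[lo + i]


def sortbitonic(a, lo, n):
    """Sort the bitonic sequence a[lo:lo+n] in place (n is a power of 2)."""
    if n == 1:
        return
    half = n // 2
    for i in range(lo, lo + half):
        if not a[i] < a[i + half]:          # same test as in Figure 6-3
            a[i], a[i + half] = a[i + half], a[i]
    sortbitonic(a, lo, half)                # lower and upper do not overlap
    sortbitonic(a, lo + half, half)         # and are adjacent, so cat is free


def merge(a, lo, n, m):
    """Merge sorted a[lo:lo+n] and a[lo+n:lo+n+m] without extra storage."""
    reverse(a, lo + n, m)
    sortbitonic(a, lo, n + m)


def sort(a, lo=0, n=None):
    """Sort a[lo:lo+n] in place; the length must be a power of 2."""
    if n is None:
        n = len(a)
    if n == 1:
        return
    half = n // 2
    sort(a, lo, half)
    sort(a, lo + half, half)
    merge(a, lo, half, half)
```

Calling `sort(data)` on a list whose length is a power of 2 reorders it without allocating any further arrays. Replacing the exchange loop in `reverse` by a construction of a new reversed array would reintroduce exactly the copy mentioned above.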
7. Experimental results

A  compiler  for  a  substantial  part  of  SAL  has  been  implemented  to  verify  the  effectiveness  of  the 
approach.  This  compiler  generates  an  intermediate  code  called  UCODE  and  has  many  other  aspects  we 
have  not  discussed.  They  are  fully  covered  in  a  thesis  [7]. 

The  timings  for  the  following  programs  on  a  MicroVax-II  (without  counting  the  output  times  except 
when  negligible)  were  collected  using  the  UNIX  time  command. 

•  Bubblesort:  sorts  an  array  of  1000  elements. 

•  Life  program:  500  iterations  on  a  board  10  by  10  (with  border  12  by  12) 

•  Matrix  multiply:  of  two  100  by  100  integer  matrices 

•  8  queens:  Finds  all  the  92  solutions. 

• puzzle: finds the solution. This is a very highly recursive and computationally demanding
program for solving a three-dimensional puzzle. It is often used for benchmarking C and other
languages on workstations.

• quicksort: sorts the same array as the bubblesort does (1000 elements)

• bitonic sort: sorts an array of 1024 elements (has to be a power of 2)

• perm: counts the number of permutations of an array of 7 elements. This is iterated 5 times.

• cyk: the Cocke-Younger-Kasami algorithm parses an input string of 128 a's for the following
ambiguous grammar:

A → a
A → A A

•  SIMPLE:  transliteration  of  the  NEWRZ  program,  used  in  hydrodynamic  calculations, 
considered  by  Ellis  [6] 

The user execution times (without I/O) in seconds are given in Table 7-1. The various optimization levels
are as follows:

• No Opt: No optimization was done.

• Opt1: All optimizations with no rangecheck elimination.

• Opt2: All optimizations with rangecheck elimination by analysis.

• Opt3: All optimizations with rangechecking turned off.

• Opt4: All optimizations plus some very simple optimization on UCODE generated by
inspection of code (peep-hole optimization in all the examples and 1 invariant moved in
puzzle and matrix multiply).

• pc -O: Execution time for Berkeley Pascal at its highest optimization level.
        | No Opt | Opt1 | Opt2 | Opt3 | Opt4   | pc -O   | %time
Bsort   | 1913.2 | 26.6 | 17.5 | 17.5 | 5.8(*) | 14.9    |  39%
Life    |   23.4 | 22.6 | 18.4 | 18.4 | 14.5   |  8.2    | 177%
mm      |   72.6 | 82.2 | 48.2 | 48.2 | 27.7   | 32.3    |  86%
8       |    1.3 |  1.3 |  1.2 |  1.0 |  0.8   |  3.4    |  23%
simple  |   19.9 |  1.6 |  1.6 |  1.2 |  1.0   |  1.1    |  91%
cyk     |   58.0 | 56.9 | 41.9 | 39.0 | 30.1   | 16.1    | 187%
puzz    |  393.6 | 32.6 | 30.7 | 24.0 | 18.8   | 15.8(+) | 119%
quick   |   12.5 |  2.8 |  2.8 |  1.5 |  1.2   |  1.3    |  92%
bitonic |   14.3 |  2.9 |  2.9 |  2.5 |  2.2   |  1.6    | 138%
perm    |    5.5 |  3.5 |  3.5 |  2.5 |  2.3   |  2.5    |  92%

(*): Time with UOPT

(+): The Pascal version is faithful to the SAL version; if this is not
attempted, we get a time of 13.3s

Table 7-1: Execution times of benchmarks in SAL and Berkeley Pascal.
We list also %time which is Opt4/pc -O.


It must be mentioned that UOPT [4], which is a UCODE to UCODE optimizer, could not be used except
for bubblesort. Hence, there is substantial possibility for improvement in the execution times by use of
register allocation, peephole optimization and other standard optimizations considered in compilers for
imperative languages [1]. The execution times for bubblesort with and without UOPT are instructive. We
believe that the timings could be improved by as much as 50% or more with a UCODE to UCODE
optimizer.


To give another idea of the possibilities for improvement, we have optimised the UCODE generated by
looking just for the simplest store-load peephole optimizations. Opt4 in the table refers to peephole
optimization done by inspection. In 2 cases (puzzle, matrix multiply) one invariant has been moved out of
the loops. Even with all these handicaps, it is remarkable that we report execution times for six out of
ten programs better than the best timings for Berkeley Pascal (pc -O). The programs for life and cyk
suffer because of the inability to define arrays partially in the SAL language.
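The %time column and the six-out-of-ten count can be rechecked from the Opt4 and pc -O columns of Table 7-1. The short Python sketch below does only that arithmetic; the dictionary layout and the function name are illustrative, and the numbers are the ones listed in the table.

```python
# (Opt4 seconds, pc -O seconds, %time as printed in Table 7-1)
table_7_1 = {
    "Bsort": (5.8, 14.9, 39),
    "Life": (14.5, 8.2, 177),
    "mm": (27.7, 32.3, 86),
    "8": (0.8, 3.4, 23),
    "simple": (1.0, 1.1, 91),
    "cyk": (30.1, 16.1, 187),
    "puzz": (18.8, 15.8, 119),
    "quick": (1.2, 1.3, 92),
    "bitonic": (2.2, 1.6, 138),
    "perm": (2.3, 2.5, 92),
}


def percent_time(opt4, pc_o):
    """%time as defined in the caption of Table 7-1: Opt4 / pc -O, in percent."""
    return 100.0 * opt4 / pc_o


# Programs for which the SAL time at Opt4 beats Berkeley Pascal at pc -O.
faster = [name for name, (opt4, pc_o, _) in table_7_1.items() if opt4 < pc_o]
```

The recomputed ratios agree with the printed percentages to within one point (the printed times are themselves rounded), and `faster` lists the six programs Bsort, mm, 8, simple, quick and perm.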


8.  Conclusions 

It is very gratifying to see fairly theoretical approaches like abstract interpretation have such a direct
bearing on critical issues like copy elimination.